[feature](be) Add adaptive batch size for scan path (#62835) #63005

Merged
yiguolei merged 2 commits into apache:branch-4.1 from mrhhsg:pick_abs on May 11, 2026

Conversation

@mrhhsg
Member

@mrhhsg mrhhsg commented May 6, 2026

Pick PR: #62835

Problem Summary: Add adaptive block row prediction for SegmentIterator, OLAP scan, file scan, and format readers. The scan path now uses a row ceiling plus a preferred output byte budget, which reduces oversized blocks for wide rows while preserving row-limited behavior for narrow rows. This commit also introduces the shared session/config/thrift/runtime budget plumbing used by later operators.

Adds adaptive batch size controls for scan output blocks: preferred_block_size_bytes and preferred_max_column_in_block_size_bytes.

  • Test: Unit Test
  • Unit Test: ./run-be-ut.sh --run --filter=BlockBudgetTest.*:RuntimeStateBatchSizeTest.*:RuntimeStateBlockSizeBytesTest.*:RuntimeStateMaxColBytesTest.*:MockRuntimeStateBlockBudgetTest.*:AdaptiveBlockSizePredictorTest.*:BlockReaderBatchMaxRowsTest.*:EstimateCollectedEnoughTest.*:CollectedEnoughWithColumnsTest.*:BlockReaderByteBudgetTest.*:SegmentColumnRawDataBytesTest.*:CsvReaderSetBatchSizeTest.*:NewJsonReaderSetBatchSizeTest.*:OrcReaderTest.*:TableFormatReaderTest.*:ProfileSpecTest.*:LocalExchangerTest.*
  • Behavior changed: Yes (scan output block sizing can now be byte-budget limited when adaptive batch size is enabled)
  • Does this need documentation: Yes
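
For readers unfamiliar with the idea, the "row ceiling plus byte budget" rule from the summary can be sketched roughly as below. This is a minimal illustration and not the code in this PR: the function name `predict_block_rows` and the parameters `row_ceiling`, `rows_seen`, and `bytes_seen` are hypothetical, and only `preferred_block_size_bytes` is a name taken from the description. The per-column limit (`preferred_max_column_in_block_size_bytes`) is not modelled here.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sketch: choose how many rows the next scan block should hold.
// row_ceiling                : row limit that already applies to the scan (assumed input)
// preferred_block_size_bytes : preferred output byte budget; 0 is treated as "disabled" here
// rows_seen / bytes_seen     : size of previously produced blocks (assumed history)
constexpr std::size_t predict_block_rows(std::size_t row_ceiling,
                                         std::size_t preferred_block_size_bytes,
                                         std::size_t rows_seen,
                                         std::size_t bytes_seen) {
    // Without a budget or any history, fall back to the plain row limit.
    if (preferred_block_size_bytes == 0 || rows_seen == 0 || bytes_seen == 0) {
        return row_ceiling;
    }
    // Average bytes per row, rounded up so wide rows are not underestimated.
    const std::size_t avg_row_bytes = (bytes_seen + rows_seen - 1) / rows_seen;
    // Rows that fit in the byte budget; always allow at least one row.
    const std::size_t rows_by_bytes =
            std::max<std::size_t>(1, preferred_block_size_bytes / avg_row_bytes);
    // Narrow rows stay row-limited; wide rows become byte-limited.
    return std::min(row_ceiling, rows_by_bytes);
}
```

Under these assumptions, with a ceiling of 4096 rows and an 8 MiB budget, 16-byte rows keep the full 4096-row block, while 64 KiB rows are cut to 128 rows per block.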

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@mrhhsg mrhhsg requested a review from yiguolei as a code owner May 6, 2026 06:43
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg
Member Author

mrhhsg commented May 6, 2026

run buildall

@mrhhsg
Member Author

mrhhsg commented May 9, 2026

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 83.33% (5/6) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 86.06% (358/416) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.33% (27069/36913)
Line Coverage 56.86% (291632/512884)
Region Coverage 54.16% (242731/448210)
Branch Coverage 55.89% (105564/188893)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 85.58% (356/416) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.51% (26402/36922)
Line Coverage 54.44% (279470/513332)
Region Coverage 51.53% (231282/448864)
Branch Coverage 53.03% (100273/189092)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 85.58% (356/416) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.50% (26398/36922)
Line Coverage 54.44% (279462/513332)
Region Coverage 51.54% (231338/448864)
Branch Coverage 53.03% (100267/189092)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 85.58% (356/416) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.50% (26398/36922)
Line Coverage 54.44% (279462/513332)
Region Coverage 51.54% (231338/448864)
Branch Coverage 53.03% (100267/189092)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

Issue Number: None

Related PR: None

Problem Summary: Add adaptive block row prediction for SegmentIterator,
OLAP scan, file scan, and format readers. The scan path now uses a row
ceiling plus preferred output byte budget to reduce oversized blocks for
wide rows while preserving row-limited behavior for narrow rows. This
commit also introduces the shared session/config/thrift/runtime budget
plumbing used by later operators.

Adds adaptive batch size controls for scan output blocks:
preferred_block_size_bytes and preferred_max_column_in_block_size_bytes.

- Test: Unit Test
- Unit Test: ./run-be-ut.sh --run
--filter=BlockBudgetTest.*:RuntimeStateBatchSizeTest.*:RuntimeStateBlockSizeBytesTest.*:RuntimeStateMaxColBytesTest.*:MockRuntimeStateBlockBudgetTest.*:AdaptiveBlockSizePredictorTest.*:BlockReaderBatchMaxRowsTest.*:EstimateCollectedEnoughTest.*:CollectedEnoughWithColumnsTest.*:BlockReaderByteBudgetTest.*:SegmentColumnRawDataBytesTest.*:CsvReaderSetBatchSizeTest.*:NewJsonReaderSetBatchSizeTest.*:OrcReaderTest.*:TableFormatReaderTest.*:ProfileSpecTest.*:LocalExchangerTest.*
- Behavior changed: Yes (scan output block sizing can now be byte-budget
limited when adaptive batch size is enabled)
- Does this need documentation: Yes

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mrhhsg mrhhsg force-pushed the pick_abs branch 2 times, most recently from 7b81191 to af17679 Compare May 10, 2026 14:02
Issue Number: None

Related PR: None

Problem Summary: Cluster-key MOW compaction sorts rows by cluster key, so duplicate unique keys may be non-adjacent and can remain visible in the output rowset. Scan the output rowset primary key index after compaction and add output-rowset internal delete bitmap entries for older duplicate unique-key rows.

None

- Test: Unit Test
    - Ran ./run-be-ut.sh --run --filter=VerticalCompactionTest.ClusterKeyMowCompactionNeedsOutputRowsetInternalDedup -j 8
- Behavior changed: No
- Does this need documentation: No
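
The post-compaction dedup described in this commit message can be pictured with the sketch below. It is an illustration under stated assumptions, not the Doris implementation: `PkEntry`, its `recency` field (larger means newer), and `collect_older_duplicates` are hypothetical names, and a plain `std::set` of row ids stands in for the delete bitmap. The input is assumed to be ordered by unique key, as a walk over a primary key index would be, so duplicates that are non-adjacent in cluster-key order become adjacent here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical view of one primary key index entry in the output rowset.
struct PkEntry {
    std::string key;        // encoded unique key
    std::uint64_t recency;  // assumed ordering value: larger means newer
    std::uint32_t row_id;   // row position inside the output rowset
};

// Returns the row ids that should be marked deleted: for every unique key,
// all rows except the newest one. Entries must be sorted by key.
inline std::set<std::uint32_t> collect_older_duplicates(
        const std::vector<PkEntry>& sorted_entries) {
    assert(std::is_sorted(sorted_entries.begin(), sorted_entries.end(),
                          [](const PkEntry& a, const PkEntry& b) { return a.key < b.key; }));
    std::set<std::uint32_t> deleted;
    std::size_t i = 0;
    while (i < sorted_entries.size()) {
        // Find the run [i, j) of entries sharing one key and its newest member.
        std::size_t newest = i;
        std::size_t j = i + 1;
        while (j < sorted_entries.size() && sorted_entries[j].key == sorted_entries[i].key) {
            if (sorted_entries[j].recency > sorted_entries[newest].recency) {
                newest = j;
            }
            ++j;
        }
        // Every other row in the run is an older duplicate.
        for (std::size_t k = i; k < j; ++k) {
            if (k != newest) {
                deleted.insert(sorted_entries[k].row_id);
            }
        }
        i = j;
    }
    return deleted;
}
```

How "older" is actually decided (version, sequence column, or segment order) is not stated in the message, so the `recency` field is only a placeholder for that ordering.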
@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (4/6) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 83.74% (546/652) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.46% (26384/36919)
Line Coverage 54.54% (280001/513416)
Region Coverage 51.80% (232630/449067)
Branch Coverage 53.15% (100563/189200)

@yiguolei yiguolei closed this May 11, 2026
@yiguolei yiguolei reopened this May 11, 2026
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 11, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

skip buildall

@yiguolei yiguolei merged commit 83d9e70 into apache:branch-4.1 May 11, 2026
46 of 50 checks passed

Labels

approved (Indicates a PR has been approved by one committer.), reviewed

3 participants